Abstract

Our goal is to identify the features that predict the occurrence and placement of discourse cues in tutorial explanations in order to aid in the automatic generation of explanations. Previous attempts to devise rules for text generation were based on intuition or small numbers of constructed examples. We apply a machine learning program, C4.5, to induce decision trees for cue occurrence and placement from a corpus of data coded for a variety of features previously thought to affect cue usage. Our experiments enable us to identify the features with most predictive power, and show that machine learning can be used to induce decision trees useful for text generation.

1 Introduction

Discourse cues are words or phrases, such as because, first, and although, that mark structural and semantic relationships between discourse entities. They play a crucial role in many discourse processing tasks, including plan recognition (Litman and Allen, 1987), text comprehension (Cohen, 1984; Hobbs, 1985; Mann and Thompson, 1986; Reichman-Adar, 1984), and anaphora resolution (Grosz and Sidner, 1986). Moreover, research in reading comprehension indicates that felicitous use of cues improves comprehension and recall (Goldman, 1988), but that their indiscriminate use may have detrimental effects on recall (Millis, Graesser, and Haberlandt, 1993).

Our goal is to identify general strategies for cue usage that can be implemented for automatic text generation. From the generation perspective, cue usage consists of three distinct, but interrelated problems: (1) occurrence: whether or not to include a cue in the generated text, (2) placement: where the cue should be placed in the text, and (3) selection: what lexical item(s) should be used.

Prior work in text generation has focused on cue selection (McKeown and Elhadad, 1991; Elhadad and McKeown, 1990), or on the relation between cue occurrence and placement and specific rhetorical structures (Rosner and Stede, 1992; Scott and de Souza, 1990; Vander Linden and Martin, 1995). Other hypotheses about cue usage derive from work on discourse coherence and structure. Previous research (Hobbs, 1985; Grosz and Sidner, 1986; Schiffrin, 1987; Mann and Thompson, 1988; Elhadad and McKeown, 1990), which has been largely descriptive, suggests factors such as structural features of the discourse (e.g., level of embedding and segment complexity), intentional and informational relations in that structure, ordering of relata, and syntactic form of discourse constituents.

Moser and Moore (1995; 1997) coded a corpus
of naturally occurring tutorial explanations for the 
range of features identified in prior work. Because 
they were also interested in the contrast between oc- 
currence and non-occurrence of cues, they exhaus- 
tively coded for all of the factors thought to con- 
tribute to cue usage in all of the text. From their 
study, Moser and Moore identified several interesting 
correlations between particular features and specific 
aspects of cue usage, and were able to test specific 
hypotheses from the literature that were based on 
constructed examples. 

In this paper, we focus on cue occurrence and 
placement, and present an empirical study of the hy- 
potheses provided by previous research, which have 
never been systematically evaluated with naturally 
occurring data. We use a machine learning program, 
C4.5 (Quinlan, 1993), on the tagged corpus of Moser
and Moore to induce decision trees. The number of
coded features and their interactions makes the man- 
ual construction of rules that predict cue occurrence 
and placement an intractable task. 



Our results largely confirm the suggestions from the literature, and clarify them by highlighting the most influential features for a particular task. Discourse structure, in terms of both segment structure and levels of embedding, affects cue occurrence the most; intentional relations also play an important
role. For cue placement, the most important factors 
are syntactic structure and segment complexity. 

The paper is organized as follows. In Section 2 we discuss previous research in more detail. Section 3 provides an overview of Moser and Moore's coding scheme. In Section 4 we present our learning experiments, and in Section 5 we discuss our results and conclude.

2 Related Work 



McKeown and Elhadad (1991; 1990) studied several connectives (e.g., but, since, because), and included many insightful hypotheses about cue selection; their observation that the distinction between but and although depends on the point of the move is related to the notion of core discussed below. However, they do not address the problem of cue occurrence.



Other researchers (Rosner and Stede, 1992; Scott and de Souza, 1990) are concerned with generating text from "RST trees", hierarchical structures where leaf nodes contain content and internal nodes indicate the rhetorical relations, as defined in Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), that exist between subtrees. They proposed heuristics for including and choosing cues based on the rhetorical relation between spans of text, the order of the relata, and the complexity of the related text spans. However, (Scott and de Souza, 1990) was based on a small number of constructed examples, and (Rosner and Stede, 1992) focused on a small number of RST relations.



(Litman, 1996) and (Siegel and McKeown, 1994) have applied machine learning to disambiguate between the discourse and sentential usages of cues; however, they do not consider the issues of occurrence and placement, and approach the problem from the point of view of interpretation. We closely follow the approach in (Litman, 1996) in two ways. First, we use C4.5. Second, we experiment first with each feature individually, and then with "interesting" subsets of features.



3 Relational Discourse Analysis 

This section briefly describes Relational Discourse Analysis (RDA) (Moser, Moore, and Glendening, 1996), the coding scheme used to tag the data for our machine learning experiments.¹

RDA is a scheme devised for analyzing tutorial explanations in the domain of electronics troubleshooting. It synthesizes ideas from (Grosz and Sidner, 1986) and from RST (Mann and Thompson, 1988). Coders use RDA to exhaustively analyze each explanation in the corpus, i.e., every word in each explanation belongs to exactly one element in the analysis. An explanation may consist of multiple segments. Each segment originates with an intention of the speaker. Segments are internally structured and consist of a core, i.e., that element that most directly expresses the segment purpose, and any number of contributors, i.e., the remaining constituents. For each contributor, one analyzes its relation to the core from an intentional perspective, i.e., how it is intended to support the core, and from an informational perspective, i.e., how its content relates to that of the core. The set of intentional relations in RDA is a modification of the presentational relations of RST, while informational relations are similar to the subject matter relations in RST. Each segment constituent, both core and contributors, may itself be a segment with a core:contributor structure. In some cases the core is not explicit. This is often the case with the whole tutor's explanation, since its purpose is to answer the student's explicit question.

As an example of the application of RDA, consider the partial tutor explanation in (1).² The purpose of this segment is to inform the student that she made the strategy error of testing inside part3 too soon. The constituent that makes the purpose obvious, in this case (1-B), is the core of the segment. The other constituents help to serve the segment purpose by contributing to it. (1-C) is an example of a subsegment with its own core:contributor structure; its purpose is to give a reason for testing part2 first.

(1)  Although         A. you know that part1 is good,
                      B. you should eliminate part2 before troubleshooting inside part3.
     This is because  C. 1. part2 is moved frequently
     and thus            2. is more susceptible to damage than part3.
     Also,            D. it is more work to open up part3 for testing
     and              E. the process of opening drawers and extending cards in part3 may induce problems which did not already exist.

The RDA analysis of (1) is shown schematically in Figure 1. The core is depicted as the mother of all the relations it participates in. Each relation node is labeled with both its intentional and informational relation, with the order of relata in the label indicating the linear order in the discourse. Each relation node has up to two daughters: the cue, if any, and the contributor, in the order they appear in the discourse.

[Figure 1: The RDA analysis of (1). Tree diagram not reproduced; its relation nodes are labeled concede / criterion:act (cue: Although), convince / act:reason (cues: (This is) because, Also, and), and convince / cause:effect (cue: and thus).]

1 For more detail about the RDA coding scheme see (Moser and Moore, 1995; Moser and Moore, 1997).

2 To make the example more intelligible, we replaced references to parts of the circuit with the labels part1, part2 and part3.

Coders analyze each explanation in the corpus and enter their analyses into a database. The corpus consists of 854 clauses comprising 668 segments, for a total of 780 relations. Table 1 summarizes the distribution of different relations, and the number of cued relations in each category. Joints are segments comprising more than one core, but no contributor; clusters are multiunit structures with no recognizable core:contributor relation. (1-B) is a cluster composed of two units (the two clauses), related only at the informational level by a temporal relation. Both clauses describe actions, with the first action description embedded in a matrix ("You should"). Cues are much more likely to occur in clusters, where only informational relations occur, than in core:contributor structures, where intentional and informational relations co-occur ($\chi^2 = 33.367$, $p < .001$, $df = 1$). In the following, we will not discuss joints and clusters any further.



An important result pointed out by (Moser and Moore, 1995) is that cue placement depends on core position. When the core is first and a cue is associated with the relation, the cue never occurs with the core. In contrast, when the core is second, if a cue occurs, it can occur either on the core or on the contributor.



4 Learning from the corpus

4.1 The algorithm



We chose the C4.5 learning algorithm (Quinlan, 1993) because it is well suited to a domain such as ours with discrete valued attributes. Moreover, C4.5 produces decision trees and rule sets, both often used in text generation to implement mappings from function features to forms.³ Finally, C4.5 is both readily available and a benchmark learning algorithm; see, e.g., (Litman, 1996; Mooney, 1996; Vander Linden and Di Eugenio, 1996).





3 We will discuss only decision trees here.

Type of relation    Total    # of cued relations
Core:Contributor      406    181
Joints                 64     19
Clusters              310    276
Total                 780    476

Table 1: Distributions of relations and cue occurrences

As our dataset is small, the results we report are based on cross-validation, which (Weiss and Kulikowski, 1991) recommends as the best method to evaluate decision trees on datasets whose cardinality is in the hundreds. Data for learning should be divided into training and test sets; however, for small datasets this has the disadvantage that a sizable portion of the data is not available for learning. Cross-validation obviates this problem by running the algorithm N times (N=10 is a typical value): in each run, $(N-1)/N$ of the data, randomly chosen, is used as the training set, and the remaining $1/N$ is used as the test set. The error rate of a tree obtained by using the whole dataset for training is then assumed to be the average error rate on the test set over the N runs. Further, as C4.5 prunes the initial tree it obtains to avoid overfitting, it computes both actual and estimated error rates for the pruned tree; see (Quinlan, 1993, Ch. 4) for details. Thus, below we will report the average estimated error rate on the test set, as computed by 10-fold cross-validation experiments.
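The N-fold procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the folds are built as disjoint partitions of a shuffled index list (one common way to realize N-fold cross-validation), and the learner is a trivial majority-class stand-in rather than C4.5, so only the evaluation loop mirrors the text.

```python
import random
from collections import Counter


def k_fold_indices(n_items, n_folds=10, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into n_folds disjoint test folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def train_majority(train_examples, train_labels):
    """Stand-in learner (an assumption, not C4.5): always predict the most frequent label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def cross_validated_error(examples, labels, n_folds=10, seed=0, train=train_majority):
    """Average the test-set error rate over n_folds train/test splits."""
    fold_rates = []
    for test_idx in k_fold_indices(len(examples), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(examples)) if i not in held_out]
        predict = train([examples[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        errors = sum(predict(examples[i]) != labels[i] for i in test_idx)
        fold_rates.append(errors / len(test_idx))
    return sum(fold_rates) / len(fold_rates)


if __name__ == "__main__":
    # Toy data shaped like Core2: 155 relations, 100 of them cued.
    toy_labels = ["cue"] * 100 + ["no-cue"] * 55
    toy_examples = [None] * len(toy_labels)  # the stand-in learner ignores features
    print(cross_validated_error(toy_examples, toy_labels))
```

Any learner with the same `train(examples, labels) -> predict` shape could be substituted for the stand-in; the averaging over folds is what corresponds to the reported error rates.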

4.2 The features 

Each data point in our dataset corresponds to a core:contributor relation, and is characterized by the following features, summarized in Table 2.

Segment Structure. Three features capture the global structure of the segment in which the current core:contributor relation appears.

• (Con)Trib(utor)-pos(ition) captures the position of a particular contributor within the larger segment in which it occurs, and encodes the structure of the segment in terms of how many contributors precede and follow the core. For example, contributor (1-D) in Figure 1 is labeled as B1A3-2after, as it is the second contributor following the core in a segment with 1 contributor before and 3 after the core.

• Inten(tional)-structure indicates which contributors in the segment bear the same intentional relations to the core.

• Infor(mational)-structure. Similar to intentional structure, but applied to informational relations.
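The Trib-pos encoding can be illustrated with a small helper. The label layout (B for the number of contributors before the core, A for the number after, then the contributor's rank and a "pre" or "after" suffix) is inferred from the B1A3-2after example and from the values shown in the decision trees; the function name and its argument checks are assumptions made for this sketch.

```python
def trib_pos_label(n_before, n_after, rank, side):
    """Build a Trib-pos label such as 'B1A3-2after'.

    n_before / n_after: contributors preceding / following the core.
    rank: 1-based position of this contributor on its side of the core.
    side: 'pre' if the contributor precedes the core, 'after' if it follows it.
    """
    if side not in ("pre", "after"):
        raise ValueError("side must be 'pre' or 'after'")
    limit = n_before if side == "pre" else n_after
    if not 1 <= rank <= limit:
        raise ValueError("rank must lie between 1 and the contributor count on that side")
    return f"B{n_before}A{n_after}-{rank}{side}"


if __name__ == "__main__":
    # Contributor (1-D): second contributor after the core, 1 before and 3 after.
    print(trib_pos_label(1, 3, 2, "after"))
```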

Core:contributor relation. These features more specifically characterize the current core:contributor relation.

• Inten(tional)-rel(ation). One of concede, convince, enable.

• Infor(mational)-rel(ation). About 30 informational relations have been coded for. However, as preliminary experiments showed that using them individually results in overfitting the data, we classify them according to the four classes proposed in (Moser, Moore, and Glendening, 1996): causality, similarity, elaboration, temporal. Temporal relations only appear in clusters, thus not in the data we discuss in this paper.

• Syn(tactic)-rel(ation). Captures whether the core and contributor are independent units (segments or sentences); whether they are coordinated clauses; or which of the two is subordinate to the other.

• Adjacency. Whether core and contributor are adjacent in linear order.

Embedding. These features capture segment em- 
bedding, Core-type and Trib-type qualitatively, and 
Above/Below quantitatively. 

• Core-type/(Con)Trib(utor)-type. Whether the core/the contributor is a segment, or a minimal unit (further subdivided into action, state, matrix).

• Above/Below encode the number of relations hi- 
erarchically above and below the current rela- 
tion. 
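As a reading aid, one data point with the eleven features listed above could be represented as follows. The class name, field names, and field types are hypothetical choices for this sketch; the original coding is stored in a database whose schema is not described here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CoreContributorRelation:
    # Segment structure
    trib_pos: str          # e.g. "B1A3-2after"
    inten_structure: str
    infor_structure: str
    # Core:contributor relation
    inten_rel: str         # concede, convince, or enable
    info_rel: str          # one of the four classes of informational relations
    syn_rel: str
    adjacency: bool
    # Embedding
    core_type: str         # segment or minimal unit
    trib_type: str         # segment or minimal unit
    above: int             # relations hierarchically above the current one
    below: int             # relations hierarchically below the current one


def feature_names():
    """Return the names of the coded features, in declaration order."""
    return [f.name for f in fields(CoreContributorRelation)]


if __name__ == "__main__":
    print(len(feature_names()), feature_names())
```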

Feature type               Feature          Description
Segment structure          Trib-pos         relative position of contrib in segment +
                                            number of contribs before and after core
                           Inten-structure  intentional structure of segment
                           Infor-structure  informational structure of segment
Core:contributor relation  Inten-rel        enable, convince, concede
                           Info-rel         4 classes of about 30 distinct relations
                           Syn-rel          independent sentences / segments,
                                            coordinated clauses, subordinated clauses
                           Adjacency        are core and contributor adjacent?
Embedding                  Core-type        segment, minimal unit
                           Trib-type        segment, minimal unit
                           Above / Below    number of relations hierarchically
                                            above / below current relation

Table 2: Features

4.3 The experiments

Initially, we performed learning on all 406 instances of core:contributor relations. We quickly determined that this approach would not lead to useful decision trees. First, the trees we obtained were extremely complex (at least 50 nodes). Second, some of the subtrees corresponded to clearly identifiable subclasses of the data, such as relations with an implicit core, which suggested that we should apply learning to these independently identifiable subclasses. Thus, we subdivided the data into three subsets:

• Core1: core:contributor relations with the core in first position

• Core2: core:contributor relations with the core in second position

• Impl(icit)-core: core:contributor relations with an implicit core

While this has the disadvantage of smaller training sets, the trees we obtain are more manageable and more meaningful. Table 3 summarizes the cardinality of these sets, and the frequencies of cue occurrence.

We ran four sets of experiments. In three of them we predict cue occurrence and in one cue placement.⁴

4.3.1 Cue Occurrence 

Table 4 summarizes our main results concerning cue occurrence, and includes the error rates associated with different feature sets. We adopt Litman's (1996) approach to determine whether two error rates $\mathcal{E}_1$ and $\mathcal{E}_2$ are significantly different. We compute 95% confidence intervals for the two error rates using a t-test. $\mathcal{E}_1$ is significantly better than $\mathcal{E}_2$ if the upper bound of the 95% confidence interval for $\mathcal{E}_1$ is lower than the lower bound of the 95% confidence interval for $\mathcal{E}_2$.
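The comparison rule can be written down directly. The sketch below assumes that each error rate is reported as a mean plus or minus the half-width of its 95% confidence interval, which is how the figures in Tables 4 and 5 appear to be given; the function name is hypothetical.

```python
def significantly_better(rate_1, half_width_1, rate_2, half_width_2):
    """True if error rate 1 is significantly better (lower) than error rate 2.

    Rule from the text: the upper bound of the 95% confidence interval for
    rate 1 must be lower than the lower bound of the interval for rate 2.
    """
    upper_1 = rate_1 + half_width_1
    lower_2 = rate_2 - half_width_2
    return upper_1 < lower_2


if __name__ == "__main__":
    # Core2 occurrence (Table 4): best tree 27.4±1.28 vs. Info-rel alone 33.4±0.94.
    print(significantly_better(27.4, 1.28, 33.4, 0.94))
    # Core2 placement (Table 5): Syn-rel 24.1±0.69 vs. Trib-pos 40±0.88.
    print(significantly_better(24.1, 0.69, 40.0, 0.88))
```

Two results whose intervals overlap are treated as statistically equivalent under this rule, in either order of comparison.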

For each set of experiments, we report the following: 

1. A baseline measure obtained by choosing the majority class. E.g., for Core1 58.9% of the relations are not cued; thus, by deciding to never include a cue, one would be wrong 41.1% of the time.

4 All our experiments are run with grouping turned on,
so that C4.5 groups values together rather than creating 
a branch per value. The latter choice always results in 
trees overfitted to the data in our domain. Using classes 
of informational relations, rather than individual infor- 
mational relations, constitutes a sort of a priori grouping. 



2. The best individual features whose predictive power is better than the baseline: as Table 4 makes apparent, individual features do not have much predictive power. For neither Core1 nor Impl-core does any individual feature perform better than the baseline, and for Core2 only one feature is sufficiently predictive.

3. (One of) the best induced tree(s). For each tree, we list the number of nodes, and up to six of the features that appear highest in the tree, with their levels of embedding.⁵ Figure 2 shows the tree for Core2 (space constraints prevent us from including figures for each tree). In the figure, the numbers in parentheses indicate the number of cases correctly covered by the leaf, and the number of expected errors at that leaf.

Learning turns out to be most useful for Core1,
where the error reduction (as percentage) from base- 
line to the upper bound of the best result is 32%; 
error reduction is 19% for Core2 and only 3% for 
Impl-core. 
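The majority-class baseline and the error reduction measure can be sketched as below. The function names are hypothetical, and the reduction is computed from the baseline to the upper bound of the best result, as the text describes. The example uses the Core2 figures: 155 relations of which 100 are cued, a baseline of 35.4, and a best tree at 27.4±1.28.

```python
def majority_baseline_error(n_relations, n_cued):
    """Error rate (in percent) of always predicting the majority class."""
    n_majority = max(n_cued, n_relations - n_cued)
    return 100.0 * (n_relations - n_majority) / n_relations


def error_reduction(baseline, best_rate, half_width):
    """Relative error reduction (in percent) from the baseline to the
    upper bound of the best result's confidence interval."""
    upper_bound = best_rate + half_width
    return 100.0 * (baseline - upper_bound) / baseline


if __name__ == "__main__":
    print(majority_baseline_error(155, 100))   # Core2 baseline from the raw counts
    print(error_reduction(35.4, 27.4, 1.28))   # Core2 reduction, roughly 19%
```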

5 The trees that C4.5 generates are right-branching, so this description is fairly adequate.

Dataset      # of relations    # of cued relations
Core1        127               52
Core2        155               100 (on Trib: 43) (on Core: 57)
Impl-core    124               29
Total        406               181

Table 3: Distributions of relations and cue occurrences

                Core1            Core2                 Impl-core
Baseline        41.1             35.4                  23.5
Best features   -                Info-rel: 33.4±0.94   -
Best tree       25.6±1.24 (15)   27.4±1.28 (18)        22.1±0.57 (10)
                0. Trib-pos      0. Trib-Pos           0. Core-type
                1. Trib-type     1. Inten-rel          1. Infor-struct
                2. Syn-rel       2. Info-rel           2. Inten-rel
                3. Core-type     3. Above
                4. Above         4. Core-type
                5. Inten-rel     5. Below

Table 4: Summary of learning results

The best tree was obtained partly by informed choice, partly by trial and error. Automatically trying out all the $2^{11} = 2048$ subsets of features would be possible, but it would require manual examination of about 2,000 sets of results, a daunting task. Thus, for each dataset we considered only the following subsets of features.

1. All features. This always results in C4.5 selecting a few features (from 3 to 7) for the final tree.

2. Subsets built out of the 2 to 4 attributes appearing highest in the tree obtained by running C4.5 on all features.

3. In Table 2, three features (Trib-pos, Inten-struct, Infor-struct) concern segment structure; eight do not. We constructed three subsets by always including the eight features that do not concern segment structure, and adding one of those that does. The trees obtained by including Trib-pos, Inten-struct, Infor-struct at the same time are in general more complex, and not significantly better than other trees obtained by including only one of these three features. We attribute this to the fact that these features encode partly overlapping information.
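To make the size of the search space and the third kind of subset concrete, the following sketch enumerates all subsets of the eleven features and builds the three subsets that pair the eight non-structural features with exactly one segment-structure feature. The feature names follow Tables 2 and 4; the constant and function names are assumptions for this sketch.

```python
from itertools import combinations

SEGMENT_STRUCTURE = ("Trib-pos", "Inten-struct", "Infor-struct")
OTHER_FEATURES = ("Inten-rel", "Info-rel", "Syn-rel", "Adjacency",
                  "Core-type", "Trib-type", "Above", "Below")
ALL_FEATURES = SEGMENT_STRUCTURE + OTHER_FEATURES


def all_subsets(features):
    """Yield every subset of the feature set, including the empty one."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


def one_structure_feature_subsets():
    """The eight non-structural features plus exactly one structural feature."""
    return [OTHER_FEATURES + (structural,) for structural in SEGMENT_STRUCTURE]


if __name__ == "__main__":
    print(sum(1 for _ in all_subsets(ALL_FEATURES)))  # size of the full search space
    for subset in one_structure_feature_subsets():
        print(subset)
```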

Finally, the best tree was obtained as follows. We 
build the set of trees that are statistically equivalent 
to the tree with the best error rate (i.e., with the 
lowest error rate upper bound). Among these trees, 
we choose the one that we deem the most perspicu- 
ous in terms of features and of complexity. Namely, 
we pick the simplest tree with Trib-Pos as the root 
if one exists, otherwise the simplest tree. Trees that 
have Trib-Pos as the root are the most useful for 
text generation, because, given a complex segment, 
Trib-Pos is the only attribute that unambiguously 
identifies a specific contributor. 
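The selection procedure just described can be sketched as follows. Several details are assumptions made for illustration: "statistically equivalent" is read as "not significantly worse than the best tree under the confidence-interval rule", "simplest" is read as "fewest nodes", and the `Tree` record and the example trees are hypothetical rather than taken from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    root: str          # feature tested at the root
    n_nodes: int       # tree size, used here as the measure of simplicity
    error: float       # average error rate (percent)
    half_width: float  # half-width of the 95% confidence interval

    @property
    def upper(self):
        return self.error + self.half_width

    @property
    def lower(self):
        return self.error - self.half_width


def select_tree(trees, preferred_root="Trib-pos"):
    """Pick the simplest tree with the preferred root among the trees
    equivalent to the best one; otherwise pick the simplest equivalent tree."""
    best = min(trees, key=lambda t: t.upper)
    # A tree is equivalent unless the best tree is significantly better than it.
    equivalent = [t for t in trees if t.lower <= best.upper]
    preferred = [t for t in equivalent if t.root == preferred_root]
    return min(preferred or equivalent, key=lambda t: t.n_nodes)


if __name__ == "__main__":
    candidates = [  # hypothetical candidates, for illustration only
        Tree("Syn-rel", 9, 25.0, 1.0),
        Tree("Trib-pos", 15, 25.6, 1.24),
        Tree("Trib-pos", 21, 26.0, 1.0),
        Tree("Trib-pos", 5, 35.0, 1.0),
    ]
    print(select_tree(candidates))
```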

Our results make apparent that the structure of segments plays a fundamental role in determining cue occurrence. One of the three features concerning segment structure (Trib-Pos, Inten-Structure, Infor-Structure) appears as the root or just below the root in all trees in Table 4; more importantly, this same configuration occurs in all trees equivalent to the best tree (even if the specific feature encoding segment structure may change). The level of embedding in a segment, as encoded by Core-type, Trib-type, Above and Below also figures prominently.

Inten-rel appears in all trees, confirming the intuition that the speaker's purpose affects cue occurrence. More specifically, in Figure 2, Inten-rel distinguishes two different speaker purposes, convince and enable. The same split occurs in some of the best trees induced on Core1, with the same outcome: i.e., convince directly correlates with the occurrence of a cue, whereas for enable other features must be taken into account.⁶ Informational relations do not appear as often as intentional relations; their discriminatory power seems more relevant for clusters. Preliminary experiments show that cue occurrence in clusters depends only on informational and syntactic relations. Finally, Adjacency does not seem to play any substantial role.



6 We can't draw any conclusions concerning concede, as there are only 24 occurrences of concede out of 406 core:contributor relations.



[Figure 2: Decision tree for Core2 (occurrence). Tree diagram not reproduced; it branches on sets of Trib-pos values (e.g., {B1A4-1pre, B2A2-2pre, B3A0-3pre}) and on Inten-rel ({convince, concede}), with leaves labeled Cue or No-Cue.]



4.3.2 Cue Placement 

While cue occurrence and placement are interrelated problems, we performed learning on them separately. First, the issue of placement arises only in the case of Core2; for Core1, cues only occur on the contributor. Second, we attempted experiments on Core2 that discriminated between occurrence and placement at the same time, and the derived trees were complex and not perspicuous. Thus, we ran an experiment on the 100 cued relations from Core2 to investigate which factors affect placing the cue on the contributor in first position or on the core in second; see Table 5.

We ran the same trials discussed above on this dataset. In this case, the best tree (see Figure 3) results from combining the two best individual features, and reduces the error rate by 50%. The most discriminant feature turns out to be the syntactic relation between the contributor and the core. However, segment structure still plays an important role, via Trib-pos.



Baseline        43%
Best features   Syn-rel: 24.1±0.69
                Trib-pos: 40±0.88
Best tree       20.6±0.97 (5)
                0. Syn-rel
                1. Trib-pos

Table 5: Cue placement on Core2



While the importance of Syn-rel for placement seems clear, its role concerning occurrence requires further exploration. It is interesting to note that the tree induced on Core1 (the only case in which Syn-rel is relevant for occurrence) includes the same distinction as in Figure 3: namely, if the contributor depends on the core, the contributor must be marked, otherwise other features have to be taken into account. Scott and de Souza (1990) point out that "there is a strong correlation between the syntactic specification of a complex sentence and its perceived rhetorical structure." It seems that certain syntactic structures function as a cue.

[Figure 3: Decision tree for Core2 (placement). Tree diagram not reproduced; it branches on Syn-rel and then on sets of Trib-pos values, with leaves labeled Cue-on-Trib or Cue-on-Core. Legend: 12d: Trib depends on Core; 21d: Core depends on Trib; ic: Core and Trib are independent clauses; cc, cp, ci: Core and Trib are coordinated phrases.]

5 Discussion and Conclusions 

We have presented the results of machine learning experiments concerning cue occurrence and placement. As (Litman, 1996) observes, this sort of empirical work supports the utility of machine learning
techniques applied to coded corpora. As our study
shows, individual features have no predictive power 
for cue occurrence. Moreover, it is hard to see how 
the best combination of individual features could be 
found by manual inspection. 

Our results also provide guidance for those building text generation systems. This study clearly indicates that segment structure, most notably the ordering of core and contributor, is crucial for determining cue occurrence. Recall that it was only by considering Core1 and Core2 relations in distinct datasets that we were able to obtain perspicuous decision trees that significantly reduce the error rate.

This indicates that the representations produced by discourse planners should distinguish those elements that constitute the core of each discourse segment, in addition to representing the hierarchical structure of segments. Note that the notion of core is related to the notions of nucleus in RST, intended effect in (Young and Moore, 1994), and of point of a move in (Elhadad and McKeown, 1990), and that text generators representing these notions exist.

Moreover, in order to use the decision trees derived here, decisions about whether or not to make the core explicit and how to order the core and contributor(s) must be made before deciding cue occurrence, e.g., by exploiting other factors such as focus (McKeown, 1985) and a discourse history.



Once decisions about core:contributor ordering and cue occurrence have been made, a generator must still determine where to place cues and select appropriate lexical items. A major focus of our future research is to explore the relationship between the selection and placement decisions. Elsewhere, we have found that particular lexical items tend to have a preferred location, defined in terms of functional (i.e., core or contributor) and linear (i.e., first or second relatum) criteria (Moser and Moore, 1997). Thus, if a generator uses decision trees such as the one shown in Figure 3 to determine where a cue should be placed, it can then select an appropriate cue from those that can mark the given intentional / informational relations, and are usually placed in that functional-linear location. To evaluate this strategy, we must do further work to understand whether there are important distinctions among cues (e.g., so, because) apart from their different preferred locations. The work of Elhadad (1990) and Knott (1996) will help in answering this question.

Future work comprises further probing into machine learning techniques, in particular investigating whether other learning algorithms are more appropriate for our problem (Mooney, 1996), especially algorithms that take into account some a priori knowledge about features and their dependencies.